{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "ecff8312",
   "metadata": {
    "colab_type": "text",
    "id": "view-in-github"
   },
   "source": [
    "<a href=\"https://colab.research.google.com/github/peremartra/Large-Language-Model-Notebooks-Course/blob/main/E1-NL2SQL%20for%20big%20Databases/Select_hs_Tables.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c1c0362a-d133-4a1b-8161-75b4f53e6bea",
   "metadata": {
    "id": "c1c0362a-d133-4a1b-8161-75b4f53e6bea"
   },
   "source": [
    "<div align=\"center\">\n",
    "<h1><a href=\"https://github.com/peremartra/Large-Language-Model-Notebooks-Course\">Learn by Doing LLM Projects</a></h1>\n",
    "    <h3>Understand And Apply Large Language Models</h3>\n",
    "    <h2>Architecting a NL2SQL Solution for inmense Enterprise Databases</h2>\n",
    "    <h3>Testig Selection Tables to Create SQL</h3>\n",
    "    by <b>Pere Martra</b>\n",
    "</div>\n",
    "\n",
    "<br>\n",
    "\n",
    "<div align=\"center\">\n",
    "    &nbsp;\n",
    "    <a target=\"_blank\" href=\"https://www.linkedin.com/in/pere-martra/\"><img src=\"https://img.shields.io/badge/style--5eba00.svg?label=LinkedIn&logo=linkedin&style=social\"></a>\n",
    "    \n",
    "</div>\n",
    "\n",
    "<br>\n",
    "<hr>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "86bc2813-763c-49f2-9f5d-e1f87a20556f",
   "metadata": {
    "id": "86bc2813-763c-49f2-9f5d-e1f87a20556f"
   },
   "source": [
    "In This notebook we are going to test if using just the name of the table, and a shord definition of its contect we can use a Model like GTP3.5-Turbo to select which tables are necessary to create a SQL Order to answer the user petition."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b571dfeb-97c9-44a5-aa3b-fab008ac51aa",
   "metadata": {
    "id": "b571dfeb-97c9-44a5-aa3b-fab008ac51aa",
    "outputId": "67998bbd-c402-4444-91d9-75c5889e566b"
   },
   "outputs": [],
   "source": [
    "# Install OpenAI API.\n",
    "!pip install openai==1.1.1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7fbbdf5b-4d7f-4496-8f40-9d15ea46d023",
   "metadata": {
    "id": "7fbbdf5b-4d7f-4496-8f40-9d15ea46d023"
   },
   "outputs": [],
   "source": [
    "import openai\n",
    "openai.api_key='your-api-key'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e480bbfb-9a80-4ea6-b792-067e63ae3148",
   "metadata": {
    "id": "e480bbfb-9a80-4ea6-b792-067e63ae3148"
   },
   "outputs": [],
   "source": [
    "#Functio to call the model.\n",
    "def return_OAI(user_message):\n",
    "    context = []\n",
    "    context.append({'role':'system', \"content\": user_message})\n",
    "\n",
    "    response = openai.chat.completions.create(\n",
    "            model=\"gpt-3.5-turbo\",\n",
    "            messages=context,\n",
    "            temperature=0,\n",
    "        )\n",
    "\n",
    "    return (response.choices[0].message.content)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6d5d2bdc-d822-4ed8-815e-b3f223730f15",
   "metadata": {
    "id": "6d5d2bdc-d822-4ed8-815e-b3f223730f15",
    "outputId": "61068bf0-41e3-40d9-b453-b76da5b0f086"
   },
   "outputs": [],
   "source": [
    "#Definition of the tables.\n",
    "import pandas as pd\n",
    "\n",
    "# Table and definitions sample\n",
    "data = {'table': ['employees', 'salary', 'studies'],\n",
    "        'definition': ['Employee information, name...',\n",
    "                       'Salary details for each year',\n",
    "                       'Educational studies, name of the institution, type of studies, level']}\n",
    "df = pd.DataFrame(data)\n",
    "print(df)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "009e681c-d95d-4c9c-b044-5bfd76e3ad95",
   "metadata": {
    "id": "009e681c-d95d-4c9c-b044-5bfd76e3ad95"
   },
   "outputs": [],
   "source": [
    "text_tables = '\\n'.join([f\"{row['table']}: {row['definition']}\" for index, row in df.iterrows()])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fe03ac25-8d02-4cd1-99a1-be4220c6fd2d",
   "metadata": {
    "id": "fe03ac25-8d02-4cd1-99a1-be4220c6fd2d",
    "outputId": "c1f3aab1-5f26-48fe-fcf9-3780120f5aad"
   },
   "outputs": [],
   "source": [
    "print(text)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c7e275ae-f20d-4134-b9b6-d8677dfb544c",
   "metadata": {
    "id": "c7e275ae-f20d-4134-b9b6-d8677dfb544c"
   },
   "outputs": [],
   "source": [
    "prompt_question_tables = \"\"\"\n",
    "Given the following tables and their content definitions,\n",
    "###Tables\n",
    "{tables}\n",
    "\n",
    "Tell me which tables would be necessary to query with SQL to address the user's question below.\n",
    "Return the table names in a json format.\n",
    "###User Questyion:\n",
    "{question}\n",
    "\"\"\"\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b1cb5957-2df2-4e5e-9e6a-ace955c9817e",
   "metadata": {
    "id": "b1cb5957-2df2-4e5e-9e6a-ace955c9817e"
   },
   "outputs": [],
   "source": [
    "#Creating the prompt, with the user questions and the tables definitions.\n",
    "pqt1 = prompt_question_tables.format(tables=text_tables, question=\"Return The name of the best paid employee\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "10d30f2b-6c23-4fd6-8038-840fba784cce",
   "metadata": {
    "id": "10d30f2b-6c23-4fd6-8038-840fba784cce",
    "outputId": "9924022c-7b2b-4ec8-e2c2-75bc1c745151"
   },
   "outputs": [],
   "source": [
    "print(return_OAI(pqt1))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "57e07083-be8f-4cd0-95bd-c4b909422c6b",
   "metadata": {
    "id": "57e07083-be8f-4cd0-95bd-c4b909422c6b"
   },
   "outputs": [],
   "source": [
    "pqt3 = prompt_question_tables.format(tables=text_tables,\n",
    "                                     question=\"Return the Education Institution with a higher average salary\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a0eeb79e-caf1-4f48-9897-168d95d2ae37",
   "metadata": {
    "id": "a0eeb79e-caf1-4f48-9897-168d95d2ae37",
    "outputId": "81d77115-9cad-4284-a228-5368bb9aa6fb"
   },
   "outputs": [],
   "source": [
    "print(return_OAI(pqt3))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "321bb9a2-4937-4e9a-a31b-7049cb8f5aa3",
   "metadata": {
    "id": "321bb9a2-4937-4e9a-a31b-7049cb8f5aa3"
   },
   "source": [
    "# Conclusions\n",
    "It is clear that GPT-3.5-Turbo is a model entirely capable of deciding which tables should be used in creating an SQL query.\n",
    "\n",
    "In a more complex system, we may need to refine the definition and conduct several tests before finalizing them.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1269aaf2-e203-44da-b3cb-80bf538d67a3",
   "metadata": {
    "id": "1269aaf2-e203-44da-b3cb-80bf538d67a3"
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "colab": {
   "include_colab_link": true,
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
